Workshop Exercise: Emissions Data - Table, Map, and Chart

Introduction

In this exercise, you will work with a CO2 emissions dataset downloaded from Gapminder and produce a report with three tabs: a data table, a line chart, and a choropleth map.

The goal is to roughly replicate the Our World in Data visualization page on consumption-based CO2 emissions.

Be sure to view that page to get an idea of the final product.

Setup

  • You should have forked and cloned this repository to your local machine.

  • Now, create and select a virtual environment in VSCode.

  • Install the following packages:

    • pandas
    • plotly
    • itables
    • ipykernel
    • jupyter
    • country_converter
  • Download the data from Gapminder by selecting: Environment > Emissions > CO2 Total emissions, then downloading the CSV file into a data folder in your repository.

Data Import

Run the following code to import the necessary libraries:

import pandas as pd
import numpy as np
import plotly.express as px
from itables import show
import country_converter as coco

Load in your dataset from gapminder below. View it in your data viewer to get an idea of the structure.

pd.set_option("mode.copy_on_write", True)
# Load the data
emissions = pd.read_csv("data\co2_cons.csv")
emissions
<>:2: SyntaxWarning:

invalid escape sequence '\c'

<>:2: SyntaxWarning:

invalid escape sequence '\c'

C:\Users\aijay\AppData\Local\Temp\ipykernel_27220\2093159210.py:2: SyntaxWarning:

invalid escape sequence '\c'
country 1800 1801 1802 1803 1804 1805 1806 1807 1808 ... 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
0 Afghanistan 0.002 0.002 0.002 0.002 0.002 0.002 0.003 0.003 0.003 ... 8.84 8.290 8.850 8.480 8.820 9.320 9.850 10.200 11.2 11.7
1 Angola 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 0.015 ... 33.4 44.600 34.300 34.500 34.600 35.100 37.100 37.500 41.5 43.6
2 Albania 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ... 6.54 6.480 5.880 5.800 6.120 5.980 5.890 5.740 6.06 5.98
3 Andorra 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ... 0.421 0.418 0.428 0.441 0.463 0.459 0.462 0.438 0.472 0.472
4 UAE 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 0.003 ... 236 237.000 241.000 240.000 216.000 214.000 196.000 183.000 194 199
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
189 Samoa 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ... 0.208 0.219 0.247 0.260 0.268 0.277 0.291 0.281 0.307 0.318
190 Yemen 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 ... 26.8 26.000 13.700 11.000 10.900 11.000 11.500 11.700 12.8 13.3
191 South Africa 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 ... 334 334.000 322.000 320.000 314.000 311.000 330.000 301.000 302 310
192 Zambia 0.191 0.194 0.198 0.202 0.206 0.210 0.214 0.218 0.222 ... 7.78 8.810 8.430 7.890 8.210 8.340 8.950 7.350 8.09 8.49
193 Zimbabwe 0.065 0.065 0.065 0.065 0.066 0.066 0.066 0.066 0.067 ... 12.3 12.900 13.500 11.900 10.400 12.500 12.000 11.600 12.6 13.1

194 rows × 224 columns

Initial Cleaning

In this dataset, some values are given in thousands, with a “k” used to represent the thousands. This will cause problems when we try to make these columns numeric. So we need to clean this. We’ll do this for you, but pay close attention as you might need it for your final project.

First, let’s see the issue:

emissions.query("country == 'China'")[["country",  "2020", "2021", "2022"]]
country 2020 2021 2022
32 China 9980.0 10.6k 10.5k

Notice the letter “k” at the end of “10.6k” as an example.

We can remove the “k” and multiply those values by 1000 with the following code:

for col in ["2021", "2022"]:
    has_k = emissions[col].str.contains("k")
    values = emissions[col].str.replace("k", "")
    emissions[col] = np.where(has_k, values.astype(float) * 1000, values.astype(float))

And check that it worked:

emissions.query("country == 'China'")[["country",  "2020", "2021", "2022"]]
country 2020 2021 2022
32 China 9980.0 10600.0 10500.0

Table Section

Our goal is to create a table showing emissions for a few selected years and calculate absolute and relative changes.

  1. Subset the data to include Country, 2000, and 2022 columns only.
  2. Calculate an “Absolute Change” column as the difference between 2022 and 2000.
  3. Calculate a “Relative Change” column as the absolute change divided by the 2000 emissions, then multiplied by 100.
# Subset the data to include `country`, `2000`, and `2022` columns only.
table_df = emissions[['country', '2000', '2022']]


# Calculate absolute change as the difference between 2022 and 2000
table_df["Absolute Change"] = table_df['2022'] - table_df['2000']


# Calculate relative change as the absolute change divided by the 2000 emissions, then multiplied by 100
table_df["Relative Change"] = table_df["Absolute Change"]/table_df['2000']*100

# Round to 0 decimal places, and add a % sign to the relative change
table_df["Relative Change"] = table_df["Relative Change"].round(0).astype(str) + "%"

Now we can display this as an interactive table with itables:

show(table_df)
Loading ITables v2.4.2 from the internet... (need help?)

Chart Section

Our goal is to create a line chart from 1990 to 2022 for a few selected countries.

  1. Melt the original emissions dataset so that years become rows.
  2. Filter from 1990 to 2022 only.
  3. Choose 5 countries of your choice.
  4. Create a line chart showing emissions over time for the selected countries with Plotly Express.
# Melt the original `emissions` dataset. Your id_vars should be "country", your var_name should be "year" and your value_name should be "emissions".
emissions_long = emissions.melt(id_vars='country', var_name='year', value_name="emissions")

# Convert year to numeric using pd.to_numeric
emissions_long["year"] = pd.to_numeric(emissions_long['year'], errors='coerce')


# Convert emissions to numeric using pd.to_numeric. Here, we also convert dashes to the minus sign

emissions_long["year"] = pd.to_numeric(emissions_long['year'], errors='coerce')


# Query for years between 1990 and 2022 (that is 1990, 1991, ..., 2021, 2022)
emissions_long_1990_2022 = emissions_long.query('year >= 1990 and year <= 2022')


# Query for 5 countries (adjust these to any countries you like)
emissions_long_subset = emissions_long_1990_2022.query("country in ['Nigeria', 'UAE', 'Canada', 'Japan', 'China']")



# Create line chart. Year should be on the x-axis, emissions on the y-axis, and color should be by country.
fig_chart = px.line(emissions_long_subset, x='year', y='emissions', color='country')
fig_chart.show()

Mapping Section

This part is done for you.

Goal: Create a choropleth map showing global emissions from 1990 to 2022.

This will be animated by year.

  1. Ensure each country has a 3-letter ISO code. We’ll use country_converter for that.
  2. Create a map with px.choropleth and use animation_frame to show changes over time.
emissions_long_1990_2022 = emissions_long_1990_2022.replace({"UAE": "United Arab Emirates"})
# Convert country names to ISO3 codes
emissions_long_1990_2022["country_code"] = coco.convert(
    emissions_long_1990_2022["country"], to="ISO3"
)

fig_map = px.choropleth(
    emissions_long_1990_2022,
    locations="country_code",
    color="emissions",
    hover_name="country",
    animation_frame="year",
    title="Global CO2 Emissions (1990-2022)",
)

fig_map.show()

Final Tabset

Below, we place our results into a tabbed interface.

show(table_df)
Loading ITables v2.4.2 from the internet... (need help?)
fig_chart.show()
fig_map.show()

Deploying to GitHub Pages

As a final step, you should follow the steps outlined in the prework to deplioy your report to GitHub Pages. You will be asked to share a link to your report in the course portal